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We develop a flexible framework for modeling high-dimensional 
imaging data observed longitudinally. The approach decomposes the 
observed variability of repeatedly measured high-dimensional obser¬ 
vations into three additive components: a subject-specific imaging 
random intercept that quantifies the cross-sectional variability, a 
subject-specific imaging slope that quantifies the dynamic irreversible 
deformation over multiple realizations, and a subject-visit-specific 
imaging deviation that quantifies exchangeable effects between vis¬ 
its. The proposed method is very fast, scalable to studies including 
ultrahigh-dimensional data, and can easily be adapted to and exe¬ 
cuted on modest computing infrastructures. The method is applied 
to the longitudinal analysis of diffusion tensor imaging (DTI) data of 
the corpus callosum of multiple sclerosis (MS) subjects. The study 
includes 176 subjects observed at 466 visits. For each subject and 
visit the study contains a registered DTI scan of the corpus callosum 
at roughly 30,000 voxels. 

1. Introduction. An increasing number of longitudinal studies routinely 
acquire high-dimensional data, such as brain images or gene expression, 
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at multiple visits. This led to increased interest in generalizing standard 
models designed for longitudinal data analysis to the case when the observed 
data are massively multivariate. In this paper we propose to generalize the 
random intercept random slope mixed effects model to the case when instead 
of a scalar, one measures a high-dimensional object, such as a brain image. 
The proposed methods can be applied to longitudinal studies that include 
high-dimensional imaging observations without missing data that can be 
unfolded into a long vector. 

This paper is motivated by a study of multiple sclerosis (MS) patients 
[Reich et al. (2010)]. Multiple sclerosis is a degenerative disease of the cen¬ 
tral nervous system. A hallmark of MS is damage to and degeneration of 
the myelin sheaths that surround and insulate nerve fibers in the brain. 
Such damage results in sclerotic plaques that distort the flow of electrical 
impulses along the nerves to different parts of the body [Raine, McFarland 
and Hohlfeld (2008)]. MS also affects the neurons themselves and is associ¬ 
ated with accelerated brain atrophy. 

Our data are derived from a natural history study of 176 MS cases selected 
from a population with a wide spectrum of disease severity. Subjects were 
scanned over a 5-year period up to 10 times per subject, for a total of 
466 scans. The scans have been aligned (registered) using a 12 degrees of 
freedom transformation which accounts for rotation, translation, scaling, 
and shearing, but not for nonlinear deformation. In this study we focus on 
fractional anisotropy (FA), a useful voxel-level summary of diffusion tensor 
imaging (DTI), a type of structural Magnetic Resonance Imaging (MRI). FA 
is viewed as a measure of tissue integrity and is thought to be sensitive both 
to axon fiber density and myelination in white matter. It is measured on a 
scale between zero (isotropic diffusion characteristic of fluid-filled cavities) 
and one (anisotropic diffusion, characteristic of highly ordered white matter 
fiber bundles) [Mori (2007)]. 

The goal of the study was to quantify the location and size of longitudinal 
variability of FA along the corpus callosum. The primary region of interest 
(ROI) is a central block of the brain containing the corpus callosum, the 
major bundle of neural fibers connecting the left and right cerebral hemi¬ 
spheres. We weight FA at each voxel in the block with a probability for the 
voxel to be in the corpus callosum, where the probability is derived from an 
atlas formed using healthy-volunteer scans, and study longitudinal changes 
of weighted FAs in the blocks [Reich et al. (2010)]. Figure 1 displays the 
ROI that contains corpus callosum together with its relative location in a 
template brain. Each block is of size 38 x 72 x 11, indicating that there are 
38 sagittal, 72 coronal, and 11 axial slices, respectively. Figure 2 displays 
the 11 axial (horisontal) slices for one of the subjects from bottom to top. In 
this paper, we study the FA at every voxel of the blue blocks, which could 
be unfolded into an approximately 30,000 dimensional vector that contains 
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Fig. 1. The 3D-rendering of the region of interest (left), a blue block containing cor¬ 
pus callosum, and the template brain (right). Views: R = Right, L= Left, S = Superior, 
I = Interior, A = Anterior, P = Posterior. For the purposes of orientation, major venous 
structures are displayed in red in the right half of the template brain. The 3D-renderings 
are obtained using 3D-Slicer (2011) and 3D reconstructions of the anatomy from Pujol 
( 2010 ). 


the corresponding FA value at each entry. The variability of these images 
over multiple visits and subjects will be described by the combination of 
the following: (1) a subject-specific imaging random intercept that quanti¬ 
fies the cross-sectional variability; (2) a subject-specific imaging slope that 
quantifies the dynamic irreversible deformation over multiple visits; and (3) 



Fig. 2. The corpus callosum of a randomly chosen subject. Eleven axial slices are shown 
on the left. A histogram of the weighted FA values is on the right. Orientation: Interior 
(slice 0) to Superior (slice 10 ), Posterior (top) to Anterior (bottom), Right to Left. The 
pictures are obtained using MIPAV (2011). 
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a subject-visit-specific imaging deviation that quantifies exchangeable or re¬ 
versible visit-to-visit changes. 

High-dimensional data sets have motivated the statistical and imaging 
communities to develop new methodological approaches to data analysis. 
Successful modeling approaches involving wavelets and splines and adaptive 
kernels have been reported in the literature [Bigelow and Dunson (2009), 
Guo (2002), Hua et al. (2012), Li et al. (2011), Mohamed and Davatzikos 
(2004), Morris and Carroll (2006), Morris et al. (2011), Reiss and Ogden 
(2008, 2010), Reiss et al. (2005), Rodriguez, Dunson and Gelfand (2009), 
Yuan et al. (2014), Zhu, Brown and Morris (2011)]. A different direction 
of research has focused on principal component decompositions [Di et al. 
(2009), Crainiceanu, Staicu and Di (2009), Aston, Chiou and Evans (2010), 
Staicu, Crainiceanu and Carroll (2010), Greven et al. (2010), Di, Crainiceanu 
and Jank (2010), Zipunnikov et al. (2011a), Crainiceanu et al. (2011)], which 
led to several applications to imaging data [Shinohara et al. (2011), Gold¬ 
smith et al. (2011), Zipunnikov et al. (2011b)]. However, the high dimen¬ 
sionality of new data sets, the inherent complexity of sampling designs and 
data collection, and the diversity of new technological measurements raise 
multiple challenges that are currently unaddressed. 

Here we address the problem of exploring and analyzing populations of 
high-dimensional images at multiple visits using high-dimensional longitu¬ 
dinal functional principal components analysis (HD-LFPCA). The method 
decomposes the longitudinal imaging data into subject-specific, longitudi¬ 
nal subject-specific, and subject-visit-specific components. The dimension 
reduction for all components is done using principal components of the cor¬ 
responding covariance operators. Note that we are interested in imaging ap¬ 
plications and do not perform smoothing. However, in Section 3.4, we discuss 
how the proposed approach can be paired with smoothing and applied to 
high-dimensional functional data. The estimation and inferential methods 
are fast and can be performed on standard personal computers to analyze 
hundreds or thousands of high-dimensional images at multiple visits. This 
was achieved by the following combination of statistical and computational 
methods: (1) relying only on matrix block calculations and sequential access 
to memory to avoid loading very large data sets into the computer memory 
[see Demmel (1997) and Golub and Van Loan (1996) for a comprehensive re¬ 
view of partitioned matrix techniques]; (2) using SVD for matrices that have 
at least one dimension smaller than 10,000 [Zipunnikov et al. (2011b)]; (3) 
obtaining best linear unbiased predictors (BLUPs) of principal scores as a 
by-product of SVD of the data matrix; and (4) linking the high-dimensional 
space to a low-dimensional intrinsic space, which allows Karhunen-Loeve 
(KL) decompositions of covariance operators that cannot even be stored in 
the computer memory. Thus, the proposed methods are computationally 
linear in the dimension of images. 
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The rest of the manuscript is organized as follows. Section 2 reviews LF- 
PCA and discusses its limitation in high-dimensional settings. In Section 3 
we introduce HD-LFPCA, which provides a new statistical and computa¬ 
tional framework for LFPCA. This will circumvent the problems associated 
with LFPCA in high-dimensional settings. Simulation studies are provided 
in Section 4. Our methods are applied to the MS data in Section 5. Section 6 
concludes the paper with a discussion. 

2. Longitudinal FPCA. In this section we review the LFPCA framework 
introduced by Greven et al. (2010). We develop an estimation procedure 
based on the original one in Greven et al. (2010), but we heavily modify it 
to make it practical for applications to imaging high-dimensional data. We 
also present the major reasons why the original methods cannot be applied 
to high-dimensional data. 

2.1. Model. A brain imaging longitudinal study usually contains a sam¬ 
ple of images Yjj, where Yy is a recorded brain image of the zth subject, 
i = 1,..., I, scanned at times Ty . j = 1,..., Jj. The total number of subjects 
is denoted by I. The times T t] are subject specific. Different subjects could 
have a different number of visits (scans), Jj. The images are stored in 3- 
dimensional array structures of dimension p = p\ x P 2 x P 3 - For example, in 
the MS data p = 38 x 72 x 11 = 30,096. Note that our approach is not limited 
to the case when data are in a 3-dimensional array. Instead, it can be applied 
directly to any data structure where the voxels (or pixels, or locations, etc.) 
are the same across subjects and visits, and data can be unfolded into a 
vector. Following Greven et al. (2010), we consider the LFPCA model 

(1) Yjj-(u) = rj(v) + Xifliy) + X it i(v)Tij + 

where v denotes a voxel, p(v) is a fixed main effect, Aj^u) is the random 
imaging intercept for subject i, is the random imaging slope for 

subject i , is the time of visit j for subject i, Wij{y ) is the random 

subject/visit-specific imaging deviation. For simplicity, the main effect rj(-) 
does not depend on i and j. As discussed in Greven et al. (2010), model 
(1) and the more general model (8) in Section 3.2 are similar to functional 
models with uncorrelated [Guo (2002)] and correlated [Morris and Carroll 
(2006)] random functional effects. Instead of using smoothing splines and 
wavelets as in Guo (2002), Morris and Carroll (2006), our approach models 
the covariance structures using functional principal component analysis; we 
have found this approach to lead to the major computational advantages, 
as further discussed in Section 3. 

In the remainder of the paper, we unfold the data Yjj and represent 
it as a p x 1 dimensional vector containing the voxels in a particular or¬ 
der, where the order is preserved across all subjects and visits. We assume 
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that r](v) is a fixed surface/image and the latent (unobserved) bivariate 
process Xi(v) = (X' 0 (v),X} 1 (v)) / and process Wij(v) are square-integrable 
stochastic processes. We also assume that Xi(v) and Wij(v) are uncorre¬ 
lated. We denote by K x (ui, V 2 ) and K ll/ (y\,V 2 ) their covariance operators, 
respectively. Assuming that ~K. x (y\,v 2 ) and are continuous, we 

can use the standard Karhunen-Loeve expansions of the random processes 
[Karhunen (1947), Loeve (1978)] and represent Xi(v) = Y^Z=i ^k4>k ( v ) with 

0 * M and Wij(v) = JZZiC(v), where ($ and <j>Y 

are the eigenfunctions of the K A and K" operators, respectively. Note that 
K a and K" will be estimated by their sample counterparts on finite 2 p x 2 p 
and pxp grids, respectively. Hence, we can always make a working assump¬ 
tion of continuity for K A and K" . The LFPCA model becomes the mixed 
effects model 


OO OO 

, . Y ij(. v ) = v(v) + z ij Y ^kiv) + Y ( u )> 

k =1 1=1 

. & fcl , W ~ (0, 0 ; A*, A*, 0 ); (( ijh , <^ 2 ) ~ ( 0 , 0 ; A)f, , 0 ), 

where Z^ = (l,Tjj)' and ( 0 , 0 ; A^, A^, 0 )” indicates that a pair of vari¬ 
ables is uncorrelated with mean zero and variances A A j and A a 2 , respectively. 
Variances A a ’s are nonincreasing, that is, A^ > X x 2 if k\ < k- 2 - We do not 
require normality of the scores in the model. The only assumption is the 
existence of second order moments of the distribution of scores. In addition, 
the assumption that Xi(y) and Wij(y) are uncorrelated is ensured by the 
assumption that and {Cijl}Z\ are uncorrelated. Note that model 

(2) may be extended to include a more general vector of covariates Z ij. We 
discuss a general functional mixed model in Section 3.2. 

In practice, model 2 is projected onto the first Nx and N\y components 
of K a and K" a , respectively. Assuming that Nx and N\y are known, the 
model becomes 


( 3 ) 


N x 


N™ 


Yij{v) = rj(v) + Zjj- Y + Y («)> 


k =1 


1=1 


k ( 6 * 1 , Cite) ~ (0,0; X* , Xg , 0); (( ijh , ( ijh ) ~ (0,0; Ajf, Ag\ 0). 


The choice of the number of principal components Nx and N\y is discussed 
in Di et al. (2009), Greven et al. (2010). Typically, Nx and Nw are small 
and (3) provides significant dimension reduction of the family of images and 
their longitudinal dynamics. The main reason why the LFPCA model (3) 
cannot be fit when data are high dimensional is that the empirical covariance 
matrices K v and K" cannot be calculated, stored, or diagonalized. Indeed, 
in our case these operators would be 30,000 by 30,000 dimensional, which 
would have around 1 billion entries. In other applications these operators 
would be even bigger. 
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2.2. Estimation. Our estimation is based on the methods of moments 
(MoM) for pairwise quadratics E(Yij 1 YjJ.- 2 ). The computationally intensive 
part of fitting (3) is estimating the following massively multivariate model: 

N x N x N w 

= V + E ^ ik< ^k 0 + E + E 

k =1 fc=l Z=1 

(4) = v+* x '% + 

where r) = (r?(ui),..., t?(v p )), Yjj = {F^-(ui),..., (v p )} are p x 1 dimen¬ 
sional vectors, 0 Y1 , and <f>] ] are correspondingly vectorized eigen¬ 

vectors, T> ao = [</>f’°,..., 0^] and = [</>f. ■ •, <f>Nx\ are P x Ax di¬ 
mensional matrices, = [<f>Y, ■ ■ ■ > ‘P'nw ] is a p x N\y dimensional matrix, 
principal scores = (£a, • • ■ ■ liN x )' and Cij = (Ciji, • • •, CijJVy)' are uncorre¬ 
lated with diagonal covariance matrices E^^'f) = A A = diag(A A ,..., A a y ) 

and E(CijCij) = A u = diag(A^,..., A^), respectively. 

To obtain the eigenvectors and eigenvalues in model (4), the spectral de¬ 
compositions of K a and K" need to be constructed. The first Nx and 

Nw eigenvectors and eigenvalues are retained after this, that is, K A ~ 
<f>X A X^X' and K W w $,W A WqW^ where ^X = denotes a 

2 p x Nx matrix with orthonormal columns and " is a p x Nw matrix 
with orthonormal columns. 


Lemma 1. The MoM estimators of the covariance operators and the 
mean in (4) are unbiased and given by 



T>00 \ ' v y> u 1 

rv X 2_^ 1 ij 1 1 ij 2 'Hjiji > 

i,31,32 

lAOl _ y -yl l 2 

1 V l 1 ij2 r ‘ l ijij2 > 

i,31,32 


(5) 

K 10 — \ ' V,. y/ A > 3 
f'-X / , 1 *n 1 ij2 L ijih' 

i,ji ,32 

K 11 — Y- ■ V' h A 

^X / , 1 zji 1 ij 2 n ij\32 ’ 

i,ji,32 



I '<r w — \^ y Y- • h 5 - • 
^ ~ / , 1 1 i32 n mni 

i,j 1 ,32 

I Ji 

^=-EE Y b’ 

i =1 3= 1 


where Yjj = Y d — r), the 2 p x 2 p matrix K A = [K^K^;K^?:Ky], with 
Kx = E{& x,k £ i ($> x,8 £ i y} for k,s € {0,1 }, the weights h\ Jxn are elements 
of the 1th column of the matrix H mX 5 = F^FF') -1 , the matrix Fs xm has 
columns equal to fij lj2 = (1, T lj2 , T ljl , T in T in , J2 )', and m = Yll=i 


The proof of the lemma is given in the Appendix. The MoM estimators 
(5) define the symmetric matrices K A and K" . Identifiability of model (4) 
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requires that some subjects have more than two visits, that is, Jj > 3. Note 
that if one is only interested in estimating covariances, rj can be eliminated as 
a nuisance parameter by using MoMs for quadratics of differences if (Y i?1 — 
Yfcj 2 )(— Y kj 2 )' as in Shou et al. (2013). 

Estimating the covariance matrices is a crucial first step. However, con¬ 
structing and storing these matrices requires 0{p 2 ) calculations and 0{p 2 ) 
memory units. Even if it were possible to calculate and store these covari¬ 
ances, obtaining the spectral decompositions would be infeasible. Indeed, 
K x is a 2 px 2 p and K" is a px p dimensional matrix, which would require 
0(p 3 ) operations, making diagonalization infeasible for p> 10 4 . Therefore, 
LFPCA, which performs well when the functional dimensionality is moder¬ 
ate, fails in very high and ultrahigh-dimensional settings. 

In the next section we develop a methodology capable of handling lon¬ 
gitudinal models of very high dimensionality. The main reason why these 
methods work efficiently is because the intrinsic dimensionality of the model 
is controlled by the sample size of the study, which is much smaller compared 
to the number of voxels. The core part of the methodology is to carefully 
exploit this underlying low-dimensional space. 

3. HD-LFPCA. In this section we provide our statistical model and in¬ 
ferential methods. The main emphasis is on providing a new methodological 
approach with the ultimate goal of solving the intractable computational 
problems discussed in the previous section. 

3.1. Eigenanalysis. In Section 2 we established that the main computa¬ 
tional bottleneck for standard LFPCA of Greven et al. (2010) is construct¬ 
ing, storing, and decomposing the relevant covariance operators. In this sec¬ 
tion we propose an algorithm that allows efficient calculation of the eigenvec¬ 
tors and eigenvalues of these covariance operators without either calculating 
or storing the covariance operators. In addition, we demonstrate how all 
necessary calculations can be done using sequential access to data. One of 
the main assumptions of this section is that the sample size, n = Y2j=i is 
moderate, so calculations of order 0(n 3 ) are feasible. In Section 6 we discuss 
ways to extend our approach to situationsjvhen this assumption is violated. 

Write Y = (Yi,..., Y/), where Yj = (Yji,..., YjjJ is a centered p x Jj 
matrix and the column j, j = 1,..., Jj, ^contains the unfolded image for 
subject i at visit j. Note that the matrix Yj contains all the data for subject 
i with each column corresponding to a particular visit. The matrix Y is 
the p x n matrix obtained by column-binding the centered subject-specific 
data matrices Yj. Thus, if Yj = (Yji,..., Yjj.), then Y = (Yi,...,Y/). Our 
approach starts with constructing the SVD of the matrix Y: 

(6) Y = VS 1/2 U / . 
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Here, the matrix V ispxn dimensional with n orthonormal columns, S is a 
diagonal nx n dimensional matrix, and U is an n x n dimensional orthogo¬ 
nal matrix. Calculating the SVD of Y requires only a number of operations 
linear injthe number of parameters p. Indeed, consider the n x n symmetric 
matrix Y'Y with its spectral decomposition Y'Y = USUL Note that for 
high-dimensional p the matrix Y cannot be loaded into the memory. The 
solution is to partition it into L slices as Y' = [(Y 1 )')(Y 2 )'| • • • |(Y L )'], where 
the size of the Zth slice, Y l , is (p/L) x n and can be adapted to the avail¬ 
able computer memory and optimized tomedjrce implementation time. The 
matrix Y'Y is then calculated as ^2iLi{Y l )'Y l by streaming the individual 
blocks. This step calculates singular value decomposition of the p x n ma¬ 
trix Y. Note that for any permutation of components v, model (3) will be 
valid and the covariance structure imposed by the model can be recovered 
by doing the inverse permutation. If smoothing of the covariance matrix is 
desirable, then this step can be efficiently combined with Fast Covariance 
Estimation [FACE, Xiao et al. (2013)], a computationally efficient smoother 
of (low-rank) high-dimensional covariance matrices with p up to 100 , 000 . 

From the SVD ( 6 ) the pxn matrix V can be obtained as V = YUS~ 1//2 . 
The actual calculations can be performed on the slices of the partitioned 
matrix Y as V 1 = Y^US” 1 / 2 ,/ = 1,...,L. The concatenated slices [(V 1 )'! 
(V 2 )'| • • • |(V L )'] form the matrix of the left singular vectors V'. Therefore, 
the SVD ( 6 ) can be constructed with sequential access to the data Y with 
p- linear effort. 

After obtaining the SVD of Y, each image Y^ can be represented as 
Y ij = VS 1 / 2 ^-, where 1 % is a corresponding column of matrix U'. There¬ 
fore, the vectors Y^ differ only through the vector factors U \j of dimension 
n x 1. Comparing this SVD representation of Y,j with the right-hand side 
of (4), it follows that cross-sectional and longitudinal variability controlled 
by the principal scores Cij, and time variables T t j must be completely 
determined by the low-dimensional vectors U ij. This is the key observation 
which makes the approach feasible. Below, we provide more intuition behind 
our approach. The formal argument is presented in Lemma 2. 

First, we substitute the left-hand side of (4) with its SVD representation 
of Y ^ to get VS 1 / 2 !!^ = + 3>" Cij- Now we can multiply 

by V' both sides of the equation to get S 1/,2 Uij = V'<I> A ' 0 £. _) -TijV'<& x l £ i + 
V'<&" C tJ . If we denote A- Y, ° = V'$ X ’° of size n x Nx, A' 5 *-’ 1 = V /( I>' Y ’ 1 of 
size n x Nx, and A" = of size n x N\y, we obtain 

(7) S 1 / 2 ^ = A*’ 0 ^ + 7i,A V ' ; C; + A w Ciy 

Conditionally on the observed data, Y, models (4) and (7) are equivalent. 
Indeed, model (4) is a linear model for the n vectors Yjj’s. These vectors 


10 


V. ZIPUNNIKOV ET AL. 


span an (at most) n-dimensional linear subspace. Hence, the columns of 
the matrix V, the right singular vectors of Y, could be thought of as an 
orthonormal basis, while S 1//2 Ujj are the coordinates of Y ij in this basis. 
Multiplication by V' can be seen as a linear mapping from model (4) for the 
high-dimensional observed data Y [-s to model (7) for the low-dimensional 
data S 1//2 Ujj. Additionally, even though VV' / I p , the projection defined 
by V is lossless in the sense that model (4) can be recovered from model 

(7) using the identity VV'Y ^ = Y ij. Hence, model (7) has an “intrinsic” 
dimensionality induced by the study sample size, n. We can estimate the low¬ 
dimensional model (7) using the LFPCA methods described in Section 2. 
This step is now feasible, as it requires only 0(n 3 ) calculations. The formal 
result presented below shows that fitting model (7) is an essential step for 
getting the high-dimensional principal components in p-linear time. 

Lemma 2. The eigenvectors of the estimated covariance operators (5) 

can be calculated as <I> = VA y,0 ,4 A 1 = VA 1,1 ,# = VA" , where the 

matrices A' A, °, A^’ 1 , A" are obtained from fitting model (7). The estimated 

- x ~ w 

matrices of eigenvalues A and A are the same for both model (4) and 
model (7). 

The proof of the lemma is given in the Appendix. This result is a gener¬ 
alization of the HD-MFPCA result in Zipunnikov et al. (2011a), which was 
obtained in the case when there is no longitudinal component $ A ’ 1 . In the 
next section we provide more insights into the intrinsic model (7). 

3.2. The general functional mixed model. A natural way to generalize 
model (3) is to consider the following model: 

N x N x 

Y^ = r, + Z ijt o £ + Z ijA ]T + ■ ■ ■ 

fc=l fc=l 

( 8 ) 

N x _ N w 

+ ^ij,q > 

k =1 1=1 

where the (q+ l)-dimensional vector of covariates Z ^ \,..., Zjj_ q ) 

may include, for instance, polynomial terms of T t] and other covariates of 
interest. 

The fitting approach is essentially the same as the one described for the 
LFPCA model in Section 3.1. As before, the right singular vectors U ij con¬ 
tain the longitudinal information about (A, and covariates Z ij. The fol¬ 
lowing two results are direct generalizations of Lemmas 1 and 2. 
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Lemma 3. The MoM estimators of the covariance operators and the 
mean in (8) are unbiased and given by 


i>ks _ \v V' h X ~ 

Y ih n ij 


ijlj2 


I '{W _ \ y Y'- h ^ 


/ h (q+ L 2 +i 

ijlj2 


i,n,j2 

1 i Ji 


Ml J2 


^=-EE y *p 


t=lJ=1 


where Yij = — r) ; i/ie (g + l)p x (g + 1 )p block-matrix K v is composed of 

pxp matrices = E{& x >%(& x ’ s £ i )'} for k, s G {0,1,..., q}, the weights 
h\j 1 j 2 are elements of the Ith column of matrix H mx ((9+ l)2 + i)= F '( FF ') 1 , 
the matrix ~F^ q+ i^ 2 _ t _ 1 ) xm has columns equal to Uj\j 2 = (vec(Zjj 1 (g)Zjj 2 ),<5. (1 .,' 2 ) / ? 

and m = 221=1 J i- 


Lemma 4. The eigenvectors of the estimated covariance operators for 
(8) can be calculated as <!> ’ = VA A,fc , k = 0, 1,..., q, 4 = VA n/ , where 

the matrices A x,k , k = 0, 1,..., q, and A" are obtained from fitting the in¬ 
trinsic model 


(9) 


N x N x 

S 1/2 Ujj = Zij ,o ]T CifcAf- 13 + Z ijtl ]T ZikA*’ 1 + 

k=l k=l 


N x 




+ ^ij,q ^ ik ^k ,Q + ’ 


k =1 


1=1 


- X ~W 

The estimated matrices of eigenvalues A and A are the same for both 
model (8) and model (9). 


3.3. Estimation of principal scores. The principal scores are the coordi¬ 
nates of Y ij in the basis defined by the LFTCA model (8). In this section 
we propose an approach to calculating BLUP of the scores that is compu¬ 
tationally feasible for samples of high-resolution images. 

F irst, we introduce some notationMn Section 3.1 we showed that the SVD 
of the matrix Y can be written as Yj = VS 1 / 2 !!', where the n x Ji matrix 
U' corresponds to the subject i. Model (8) can be rewritten as 

(10) vec(Yj) = BjWj, 

where B, = [Bf iBf], Bf = Z i)0 ® $ X ’° + Z u ® $ X1 + • • • + Z hq <g> & x< i, 
Bf = I Jt <g> $ w , Z iifc = (Z iljk ,..., Z iJijk y, u>i = the subject level 

principal scores Ci = (Cu> ■ • •, Cu^Yi <8> is the Kronecker product of matrices, 
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and operation vec(-) stacks the columns of a matrix on top of each other. 
The following lemma contains the main result of this section; it shows how 
the estimated BLUPs can be calculated for the LFPCA model. 


Lemma 5. Under the general LFPCA model (8), the estimated best lin¬ 
ear unbiased predictor (EBLUP) of and Ci Is given by 

(11) Q*) =(B; : B ? ,)- 1 B'vec(Y J ), 

where all matrix factors on the right-hand side can be written in terms of 
the low-dimensional right singular vectors. 

The proof of the lemma is given in the Appendix. The EBLUPs calcu¬ 
lations are almost instantaneous, as the matrices involved in (11) are low- 
dimensional and do not depend on the dimension p. Section A.l in the 
Appendix briefly describes how the framework can be adapted to settings 
with tens or hundreds of thousands images. 


3.4. HF-LFPCA model with white noise. The original LFPCA model 
in Greven et al. (2010) was developed for functional observations and con¬ 
tained an additional white noise term. In this section, we show how the 
HD-LFPCA framework can be extended to accommodate such a term and 
how the extended model can be estimated. 

We now seek to fit the following model: 

N x N x 

Yij = V + Zij, o 0 + z ijA ]T + • • • 

k=l k=l 

( 12 ) 

N x _ N w 

+ z U,q X/ Zik&k’ 9 + X/ + £ v > 

k =1 1=1 

where is a p-dimensional white noise variable, that is, E(eij) = 0 P for 
any i.j and = a 2 5i 1 i 2 5j 1 j 2 I p . The white noise process £%j(v) is 

assumed to be uncorrelated with processes Xi(v) and Wij(v). 

Lemma 3 applied to (12) shows that +1 is 

an unbiased estimator of +cr 2 Ip. To estimate a 2 in a functional case, we 
can follow the method in Greven et al. (2010): (i) drop the diagonal elements 
of K^ 2 and use a bivariate smoother to get K^\, (ii) calculate an estimator 
a 2 = max{(tr(Iw 2 ) — 0}. To make this approach feasible in very 

high-dimensional settings (p~ 100,000), we can use the fast covariance es¬ 
timation (FACE) developed in Xiao et al. (2013), a bivariate smoother that 
scales up linearly with respect to p and preserves the low dimensionality of 
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the estimated covariance operator. Thus, HD-LFPCA remains feasible after 
smoothing by FACE. 

When the observations Y^-’s are nonfunctional, the off-diagonal smooth¬ 
ing approach cannot be used. In this case, if one assumes that model (12) 
is low-rank, then a 2 can be estimated as (tr (K^) — Jfk=i ^w)- 

Bayesian model selection approaches that estimate both the rank of PCA 
models and variance a 2 are discussed in Everson and Roberts (2000) and 
Minka (2000). 


4. Simulations. In this section three simulation studies are used to ex¬ 
plore the properties of our proposed methods. In the first study, we replicate 
several simulation scenarios in Greven et al. (2010) for functional curves, but 
we focus on using a number of parameters up to two orders of magnitude 
larger than the ones in the original scenarios. This increase in dimensionality 
could not be handled by the original LFPCA approach. In the second study, 
we explore how methods recover 3D spatial bases when the approach of 
Greven et al. (2010) cannot be implemented. In the third study, we replicate 
the unbalanced design in and use time variable Tjj from our DTI application 
and generate data using principal components estimated in Section 5. For 
each scenario, we simulated 100 data sets. All three studies were run on a 
four core i7-2.67 GHz PC with 6 Gb of RAM memory using Matlab 2010a. 
The software is available upon request. 

First scenario (ID, functional curves ). We follow Greven et al. (2010) 
and generate data as follows: 


N x N x N w 

Y ij(v) = + T ij^2&k ( t>k' 1 ( V ) + 'Yh^i3l ( t ) Y ( v )+ £ ij{ v ) 

k=1 k=1 1=1 

ue V, 


iik i -~-0.5N(- 


+ 0.51V 


k C iji i ~' 0.51V (-0^/2, A Y/2) + 0.51V (^A^/2, A^/ 2 ), 

where ' 0.51V(- /2, A*/2) + 0.5 N{^ A*/2, A* /2) means that the 

scores are simulated from a mixture of two normals, A^/2, Aj^/2) 

and lV (v ^/2 ,Af/2 ) with equal probabilities; a similar notation holds for 
Qji- The scores s and Ciji’ s are mutually independent. We set I = 100, 
Ji = 4, i = 1,..., I, and the number of eigenfunctions Nx = Nw = 4. The 
true eigenvalues are the same, X* = Ajt.' = 0.5 fc_1 , k = 1,2,3,4. The orthog¬ 
onal but not mutually orthogonal bases were 




= \/2/3sin(2 


TTV 


b Y \v) = 1/2, 
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4>f'°(v) = v / 2/3cos(27r?;), ^’ 1 (u) = V3(2v — l)/2, ($ = \/4/3</>f’ 0 , 

03 L ’°(' u ) = \/2/3sin(47rt>), 4>3 ,l ( v ) = V^Gu 2 — 6t> + l)/2, 

^ = vW’ 0 , 

0^’°( u ) = \/2/3cos(47rt;), ^^{y) = \/7(20u 3 — 30u 2 + 12v — l)/2, 

which are measured on a regular grid of p equidistant points in the inter¬ 
val [0, 1] . To explore scalability, we consider several grids with an increasing 
number of sampling points, p, equal to 750,3000, 12,000,24,000,48,000, and 
96,000. Note that a brute-force extension of the standard LFPCA would 
be at the edge of feasibility for such a large p. For each i, the first time 
Tii is generated from the uniform distribution over interval (0,1) denoted 
by £7(0,1). Then differences (T^+i — Tjj ) are also generated from £7(0,1) 
for 1 < j < 3. The times T) i ,..., T t 4 are normalized to have sample mean 
zero and variance one. Although no measurement noise is assumed in model 
(3), we simulate data that also contains white noise, £ij(v). The purpose 
of this is twofold. First, it is of interest to explore how the presence of 
white noise affects the performance of methods which do not model it ex¬ 
plicitly. Second, the choice of the eigenfunctions in the original simulation 
scenario of Greven et al. (2010) makes the estimation problem ill-posed if 
data does not contain white noise. The white noise £ij(v ) is assumed to be 
i.i.d. 7V(0,<r 2 ) for each i.j, v and independent of all other latent processes. 
To evaluate different signal-to-noise ratios, we consider values of a 2 equal to 
0.0001,0.0005,0.001,0.005,0.01. Note that we normalized each of the data 
generating eigenvectors to have norm one. Thus, the signal-to-noise ratio, 
(Ylt=i + Ylt=i ^)/(P a2 )i ranges from 50 (for p = 750 and a 2 =0.0001) 
to 0.004 (for p = 96,000 and a 2 = 0.01). 

Table 1 and Tables 1 and 2 in the online supplement [Zipunnikov et al. 
(2014)] report the average L 2 distances between the estimated and true 
eigenvectors for Xi t o(v), Xi^i(v), and Wij(v), respectively. The averages are 
calculated based on 100 simulated data sets for each (p,cr 2 ) combination. 
Standard deviations are shown in brackets. Three trends are obvious: (i) 
eigenvectors with larger eigenvalues are estimated with higher accuracy, (ii) 
larger white noise corresponds to a decreasing accuracy, (iii) for identical 
levels of white noise, accuracy goes down when the dimension p goes up. 
Similar trends are observed for average distances between estimated and 
true eigenvalues reported in Tables 3 and 4. These trends follow from the 
fact that for any fixed cr 2 , the signal-to-noise ratio decreases with increasing 
p and the performance of the approach quickly deteriorates once the signal- 
to-noise ratio becomes smaller than one. 
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Table 1 

Based on 100 simulated data sets, average distances between estimated and true 
eigenvectors of Xi : o{v); standard deviations are given in parentheses 






II /X,0 ?x,o 
1103 03 


(p. °‘ 2 ) 


(750, le-04) 
(750, 5e-04) 
(750, 0.001) 
(750, 0.005) 
(750, 0.01) 
(3000, le-04) 
(3000, 5e-04) 
(3000, 0.001) 
(3000, 0.005) 
(3000, 0.01) 
(12,000, le-04) 
(12,000, 5e-04) 
( 12 , 000 , 0 . 001 ) 
(12,000, 0.005) 
( 12 , 000 , 0 . 01 ) 

(24,000, le-04) 
(24,000, 5e-04) 
(24,000, 0.001) 
(24,000, 0.005) 
(24,000, 0.01) 

(48,000, le-04) 
(48,000, 5e-04) 
(48,000, 0.001) 
(48,000, 0.005) 
(48,000, 0.01) 
(96,000, le-04) 
(96,000, 5e-04) 
(96,000, 0.001) 
(96,000, 0.005) 
(96,000, 0.01) 


0.034 (0.048) 
0.031 (0.031) 
0.035 (0.039) 
0.035 (0.039) 
0.045 (0.036) 
0.031 (0.028) 
0.037 (0.032) 
0.031 (0.027) 
0.058 (0.035) 
0.073 (0.028) 
0.031 (0.028) 
0.041 (0.036) 
0.047 (0.04) 
0.112 (0.032) 
0.175 (0.031) 

0.035 (0.032) 
0.055 (0.045) 
0.07 (0.038) 
0.183 (0.049) 
0.295 (0.043) 

0.046 (0.068) 
0.073 (0.035) 
0.105 (0.051) 
0.307 (0.08) 
0.458 (0.084) 
0.045 (0.033) 
0.116 (0.081) 
0.188 (0.089) 
0.457 (0.065) 
0.662 (0.105) 


0.07 (0.069) 
0.055 (0.051) 
0.062 (0.054) 
0.072 (0.062) 
0.079 (0.054) 
0.064 (0.118) 
0.065 (0.048) 
0.06 (0.044) 
0.106 (0.058) 
0.142 (0.048) 

0.062 (0.048) 
0.078 (0.05) 
0.083 (0.054) 
0.217 (0.064) 
0.338 (0.093) 

0.066 (0.049) 
0.097 (0.061) 
0.125 (0.047) 
0.348 (0.097) 
0.518 (0.117) 

0.076 (0.067) 
0.13 (0.056) 
0.183 (0.065) 
0.532 (0.151) 
0.712 (0.1) 

0.087 (0.059) 
0.194 (0.094) 
0.32 (0.121) 
0.707 (0.107) 
0.926 (0.103) 


0.074 (0.053) 
0.084 (0.097) 
0.078 (0.059) 
0.096 (0.063) 
0.129 (0.102) 

0.09 (0.13) 
0.077 (0.06) 
0.087 (0.062) 
0.171 (0.09) 
0.236 (0.074) 
0.077 (0.056) 
0.121 (0.069) 
0.164 (0.114) 
0.44 (0.216) 
0.554 (0.132) 
0.09 (0.141) 
0.146 (0.09) 
0.23 (0.167) 
0.622 (0.208) 
0.742 (0.102) 

0.103 (0.059) 
0.234 (0.1) 
0.407 (0.23) 
0.824 (0.208) 
0.938 (0.074) 
0.146 (0.103) 
0.431 (0.268) 
0.787 (0.339) 
0.954 (0.125) 
1.116 (0.075) 


II ,X,0 IX ,0 


0.081 (0.07) 
0.112 (0.151) 
0.139 (0.206) 
0.159 (0.084) 
0.234 (0.103) 

0.109 (0.126) 
0.14 (0.136) 
0.131 (0.07) 
0.324 (0.096) 
0.508 (0.072) 
0.134 (0.165) 
0.201 (0.081) 
0.295 (0.118) 
0.758 (0.153) 
0.987 (0.071) 

0.146 (0.173) 
0.266 (0.098) 
0.43 (0.15) 
0.998 (0.11) 
1.184 (0.07) 

0.175 (0.122) 
0.437 (0.099) 
0.695 (0.192) 
1.19 (0.086) 
1.186 (0.126) 

0.246 (0.107) 
0.721 (0.218) 
1.062 (0.216) 
1.298 (0.074) 
1.143 (0.153) 


Figure 1 in the online supplement [Zipunnikov et al. (2014)] displays the 
true and estimated eigenfunctions for the case when p = 12,000 and a 2 = 
0.01 2 and shows the complete agreement with Figure 2 in Greven et al. 
(2010). The boxplots of the estimated eigenvalues are displayed in Figure 3. 
In Figure 4, panels one and three report the boxplots of and panels two and 
four display the medians and quantiles of the distribution of the normalized 


estimated scores, (^-^(t)/y A-^" and (C«?7 ~ Ciji)/y respectively. This 
indicates that the estimation procedures provides unbiased estimates. 
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1 2 3 4 1 2 3 4 


Fig. 3. Boxplots of the normalized estimated eigenvalues for process X i(v), 
(Aj5 — (left box), and the normalized estimated eigenvalues for process Wij{v), 

(A; — A; )/A; (right box), based on scenario 1 with 100 replications. The zero is shown 
by the solid black line. 


Second, scenario (3D). Data sets in this study replicate the 3D ROI blocks 
from the DTI MS data set. We simulated 100 data sets from the model 

N x N x N w 

Y ij(v) = X ^4>k’°( V ) + X^XW + X ( v )> v G 

* fc=l k=1 1=1 

Aik'-'NiO, A*) and (iji i ~' N(0, A^), 

where V = [1,38] x [1,72] x [1,11]. Eigenimages an d are 

displayed in Figure 5. The images in this scenario can be thought of as 3D 
images with voxel intensities on the [0,1] scale. The voxels within each sub¬ 
block (eigenimage) are set to 1 and outside voxels are set to 0. There are 
four blue and red sub-blocks corresponding to cj) k ’ and </> fc ’ , respectively. 
The eigenfunctions closest to the anterior side of the brain (labeled A in 
Figure 1) are cj) 1 ’ and (j) 1 ’ , which have the strongest signal proportional to 
the largest eigenvalue (variance), A^. The eigenvectors that are progressively 
closer to the posterior part of the brain (labeled P) correspond to smaller 
eigenvalues represented as lighter shades of blue and red, respectively. The 
sub-blocks closest to the P have the smallest signal, which is proportional 



Fig. 4. The left two panels show the distribution of the normalized estimated scores of 
process Xi[v), (£,ik — £*fc)/\/A*. Boxplots are given in the left column. The right col¬ 
umn shows the medians (black marker), 5% and 95% quantiles (blue markers), and 0.5% 
and 99.5% quantiles (red markers). Similarly, the distribution of the normalized estimated 
scores of process Wij(v), (fyj — provided at the right two panels. 
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Fig. 5. 3D eigenimages of the 2nd simulation scenario. From left to right: are in 

blue, (jy)' 1 are in red, cpY are green, the most right one shows the overlap of all eigenim¬ 
ages. Views: R = Right, L = Left, S = Superior, 1= Interior, A = Anterior, P = Posterior. 
The 3D-renderings are obtained using 3D-Slicer (2011). 


to Xf. The eigenimages (ftu shown in green are ordered the same way. Note 
that are uncorrelated with g. However, both and (jtf are corre- 
lated with the 4 > k ’ ’s describing the random slope Xi } i(v). We assume that 
I = 150, Ji = 6, i = 1,..., I, and the true eigenvalues X k = 0.5 fc-1 ,/c = 1,2,3, 
and A'' = 0.5 i-1 ,Z = 1,2. The times Tjj were generated as in the first simu¬ 
lation scenario. To apply HD-LFPCA, we unfold each image Yj j and obtain 
vectors of size p = 38 x 72 xll = 30,096. The entire simulation study took 
20 minutes or approximately 12 seconds per data set. Figures 4, 5 and 6 in 
the online supplement [Zipunnikov et al. (2014)] display the medians of the 
estimated eigenimages and the voxelwise 5th and 95th percentile images, 
respectively. All axial slices, or 2 slices in a x-y-z coordinate system, are 
the same. Therefore, we display only one 2 -slice, which is representative of 
the entire 3D image. To obtain a grayscale image with voxel values in the 
[0,1] interval, each estimated eigenvector, <f> = (</>i,..., cj) p ), was normalized 
as (f> -A- {cj) — min s cf> s )/(max s (j) s — min s cj) s ). Figure 4 in the online supplement 
[Zipunnikov et al. (2014)] displays the voxelwise medians of the estimator, 
indicating that the method recovers the spatial configuration of both bases. 
The 5-percentile and 95-percentile images are displayed in Figures 5 and 6 
in the online supplement [Zipunnikov et al. (2014)], respectively. Overall, 
the original pattern is recovered with some small distortions most likely due 
to the correlation between bases (please note the light gray patches). 

The boxplots of the estimated normalized eigenvalues (A^ — X k )/X k and 
ar - \Y)/\r are displayed in Figure 2 in the online supplement [Zipun¬ 
nikov et al. (2014)]. The eigenvalues are estimated consistently. However, 
in 6 out of 100 cases (extreme values shown in red), the estimation proce¬ 
dure did not distinguish well between and (f> I' • This is probably due the 
relatively low signal. 

The boxplots of the estimated eigenscores are displayed in Figure 3 in 
the online supplement [Zipunnikov et al. (2014)]. In this scenario, the total 
number of the estimated scores ^ is 15,000 for each k and there are 90,000 
estimated scores Ciji f° r each l. The distributions of the normalized estimated 
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scores (£jfc — £jfc)/y Aj* and (<^7 — CijO/y Y are displayed in the first and 
third panels of Figure 3 in the online supplement [Zipunnikov et al. (2014)], 
respectively. The spread of the distributions increases as the signal-to-noise 
ratio decreases. The second and fourth panels of Figure 3 in the online 
supplement [Zipunnikov et al. (2014)] display the medians, 0.5%, 5%, 95%, 
and 99.5% quantiles of the distribution of the normalized estimated scores. 

Third scenario (3D, empirical basis). We generate data using the first 
ten principal components estimated in Section 5. We replicated the un¬ 
balanced design of the MS study and used the same time variable T^’s. 
The principal scores Gfc and Qjk were simulated as in Scenario 1 with 
= Ajf = 0.5 fc_1 ,fe = 1,..., 10. The white noise variance a 2 was set to 
10 . Thus, SNR is equal to 1.32. The results are reported in Table 5 in 
the online supplement [Zipunnikov et al. (2014)]. The average distances be¬ 
tween estimated and true eigenvectors for X t (v) and Wij(v ) are calculated 
based on 100 simulated data sets. Principal components and principal scores 
become less accurate as the signal-to-noise gets smaller. 


5. Longitudinal analysis of brain fractional anisotropy in MS patients. 

In this section we apply HD-LFPCA to the DTI images of MS patients. The 
study population included individuals with no, mild, moderate, and severe 
disability. Over the follow-up period (as long as 5 years in some cases), 
there was little change in the median disability level of the cohort. Cohort 
characteristics are reported in Table 7 in the online supplement [Zipunnikov 
et al. (2014)]. The scans have been aligned using a 12 degrees of freedom 
transformation, meaning that we accounted for rotation, translation, scaling, 
and shearing, but not for nonlinear deformation. As described in Section 1 , 
the primary region of interest is a central block of the brain of size 38 x 
72 x 11 displayed in Figure 1. We weighted each voxel in the block with a 
probability for the voxel to be in the corpus callosum and study longitudinal 
changes of weighted voxels in the blocks [Reich et al. (2010)]. Probabilities 
less than 0.05 were set to zero. Below we model longitudinal variability of 
the weighted FA at every voxel of the blocks. The entire analysis performed 
in Matlab 2010a took only 3 seconds on a PC with a quad core i7-2.67 GHz 
processor and 6 Gb of RAM memory. First, we unfolded each block into 
a 30,096 dimensional vector that contained the corresponding weighted FA 
values. In addition to high dimensionality, another difficulty of analyzing this 
study was the unbalanced distribution of scans across subjects (see Table 6 
in the online supplement [Zipunnikov et al. (2014)]); this is a typical problem 
in natural history studies. After forming the data matrix Y, we estimated 
the overall mean as r) = ^ )Ci=i X^=i ^ij an d de-meaned the data. The 
estimated mean is shown in Figure 7 in the online supplement [Zipunnikov 
et al. (2014)]. The mean image across subjects and visits indicates a shape 
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Table 2 

Model 1 (Tij change): Cumulative variability explained by the first 10 eigenimages 


k 


tf' 1 

/ W 

Cumulative 

1 

22.13 

0.08 

7.12 

29.33 

2 

10.66 

0.11 

3.20 

43.29 

3 

5.99 

0.13 

2.04 

51.44 

4 

4.84 

0.08 

1.44 

57.80 

5 

2.80 

0.06 

0.90 

61.56 

6 

2.39 

0.07 

0.83 

64.85 

7 

1.94 

0.10 

0.63 

67.52 

8 

1.72 

0.08 

0.50 

69.82 

9 

1.55 

0.05 

0.45 

71.86 

10 

1.20 

0.05 

0.39 

73.50 


55.20 

0.80 

17.50 

73.50 


characterized by our scientific collaborators as a “standard corpus callosum 
template.” 

Model 1: First, we start by fitting a random intercept and random slope 
model (1). To enable comparison of the variability explained by processes 
Xi(v) and Wij(v), we followed the normalization procedure in Section 3.4 in 
Greven et al. (2010): Tjj ’s were normalized to have sample mean zero and 
sample variance one. The estimated covariance matrices are not necessarily 
nonnegative definite. Indeed, we have obtained small negative eigenvalues 
of the covariance operators K A and K" . Following Hall, Muller and Yao 
(2008), all the negative eigenvalues were set to zero. The total variation was 
decomposed into the “subject-specific” part modeled by process Xi and the 
“exchangeable visit-to-visit” part modeled by the process Wij. Most of the 
total variability, 70.8%, is explained by Xi (subject-specific variability) with 
the trace of K A = 122.53, while 29.2% is explained by Wij (exchangeable 
visit-to-visit variability) with the trace of K" = 50.47. Two major contribu¬ 
tions of our approach are to separate the processes Xi and Wij and quantify 
their corresponding contributions to the total variability. 

Table 2 reports the percentage explained by the first 10 eigenimages. 
The first 10 random intercept eigenimages explain roughly 55% of the total 
variability, while the effect of the random slope is accounting for only 0.80% 
of the total variability. The exchangeable variability captured by Wij(v) 
accounts for 17.5% of the total variation. 

The first three estimated random intercept and slope eigenimages are 
shown in pairs in Figures 6, 7, and in 8, 9, 10, 11 in the online supplement 
[Zipunnikov et al. (2014)], respectively. Figures 12, 13 and 14 in the online 
supplement [Zipunnikov et al. (2014)] display the first three eigenimages 
of the exchangeable measurement error process Wij(v). Each eigenimage is 
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Fig. 6. Eleven slices of A histogram of the voxel intensities is on the right. The 
pictures are obtained using MIPAV (2011). 


accompanied with the histogram of its voxel values. Recall that the eigen- 
images were obtained by folding the unit length eigenvectors of p ~ 3 • 10 4 
voxels. Therefore, each voxel is represented by a small value. For principal 
scores, negative and positive voxel values correspond to opposite loadings 
(directions) of variation. Each histogram has a peak at zero due to the exis¬ 
tence of the threshold for the probability maps indicating if a voxel is in the 
corpus callosum. This peak is a convenient visual divider of the color spec¬ 
trum into the loading specific colors. Because of the sign invariance of the 
SVD, the separation between positive and negative loadings is comparable 
only within the same eigenimage. However, the loadings of the random inter¬ 
cept and slope within an eigenimage of the process Xi(v ) can be compared 
as they share the same principal score. This allows us to contrast the time 
invariant random intercept with the longitudinal random slope and, thus, to 
localize regions that exhibit the largest longitudinal variability. This could 
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Fig. 7. Eleven slices of (ff 1 . A histogram of the voxel intensities is on the right. The 
pictures are obtained using MIPAV (2011). 
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be used to analyze the longitudinal changes of brain imaging in a particular 
disease or to help generate new scientific hypotheses. 

We now interpret the random intercept and slope parts of the eigenimages 
obtained for the MS data. Figures 6 and 7 show the random intercept and 
slope parts of the first eigenimage (j)f , respectively. The negatively loaded 
voxels of the random intercept, 4>i’°, essentially compose the entire corpus 
callosum. This indicates an overall shift in the mean FA of the corpus callo¬ 
sum. This is expected and is a widely observed empirical feature of principal 
components. The random slope part, (pf’ 1 , has both positively and nega¬ 
tively loaded areas in the corpus callosum. The areas colored in blue shades 
share the sign of the random intercept whereas the red shades have 

the opposite sign. The extreme colors of the spectrum of <f>\ ’ show a clear 
separation into negative and positive loadings, especially accentuated in the 
splenium (posterior) and the genu (anterior) areas of the corpus callosum; 
please note the upper and lower areas in panels 0 through 5 of Figure 7. This 
implies that a subject with a positive first component score £n > 0 would 
tend to have a smaller mean FA over the entire corpus callosum and the 
FA would tend to decrease with time in the negatively loaded parts of the 
splenium. The reverse will be true for a subject with a negative score £,;i. 
The other two eigenimages of X j (u) and eigenimages of ( v ) are discussed 
in the online supplement [Zipunnikov et al. (2014)]. 

Next, we explored whether the deviation process Wij(v ) depends on MS 
severity by analyzing the corresponding eigenscores. To do this, we di¬ 
vided subjects according to their MS type into three subgroups: relapsing- 
remitting (RR, 102 subjects), secondary progressive (SP, 40 subjects), and 
primary progressive (PP, 25 subjects). For each of the first ten eigenimages, 
we formally tested whether there are differences between the distributions 
of the scores of the three groups using the f-test and the Mann-Whitney- 
Wilcoxon-rank test for equality of means and the Kolmogorov-Smirnov test 
for equality of distributions. For the first eigenimage, the scores in the SP 
group have been significantly different from both those in RR and PP groups 
(jp -values <0.005 for all three tests). For the second eigenimage, scores in 
the RR group were significantly different from both SP and PP (p-values 
<0.01 for all three tests). The two left images of Figure 8 display the group 
beanplots of the scores for the first eigenimage and the second eigenimage 
of Wij(v ), respectively. 

In addition to MS type, the EDSS scores were recorded at each visit. We 
divided subjects into two groups according to their EDSS score: (i) smaller 
than 5 and (ii) larger than or equal to 5. As with MS type, we have conducted 
tests for the equality of distributions of the eigenscores of these two groups 
for all ten eigenimages. For eigenimages one and two, the distributions of 
eigenscores have been found to be significantly different (p -values <0.001 for 
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Fig. 8. Model 1: Group beanplots according to MS type (top) and according to EDSS 
score (bottom). 


all three tests). The two right images on Figure 8 display group beanplots 
of the scores for the first eigenimage and the second eigenimage of Wij(v), 
respectively. 

We have also conducted a standard analysis based on the scalar mean 
FA over the CC for each subject/visit and fitted a scalar random inter¬ 
cept/random slope model. In this model, the random intercept explains 
roughly 94% of the total variation of the mean FAs. Figure 15 in the online 
supplement [Zipunnikov et al. (2014)] displays beanplots of the estimated 
random intercepts stratified by EDDS score and MS type. For both cases 
there was a statistically significant difference between the distributions of the 
random intercepts (EDSS: p-values < 0.001; MS-type, SP vs. RR and PP, 
p -values < 0.002, for all three tests). Similar tests for the distributions of the 
random slopes did not identify statistically significant differences between 
groups. We conclude that this simple model agrees with the full HD-LFPCA 
mode, though the multivariate model provides a detailed decomposition of 
the total FA variation together with localization variability in the original 
3D-space. 
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Table 3 

Model 2 (Zij change): Cumulative variability explained by the first 10 eigenimages 


k 



iW 

<t>k 

Cumulative 

1 

17.79 

0.42 

5.59 

23.80 

2 

0.53 

8.46 

1.99 

34.78 

3 

6.92 

0.39 

1.55 

43.64 

4 

4.68 

0.76 

1.05 

50.13 

5 

3.02 

0.52 

0.80 

54.46 

6 

2.44 

0.29 

0.69 

57.88 

7 

1.63 

0.77 

0.54 

60.82 

8 

1.48 

0.67 

0.39 

63.36 

9 

1.41 

0.51 

0.35 

65.64 

10 

1.19 

0.38 

0.33 

67.54 


41.09 

13.17 

13.28 

67.54 


Model 2. Second, we fit model (8) using i equal to a visit-specific EDSS 
score. Again, Z^j i’s were normalized to have sample mean 0 and sample vari¬ 
ance 1. Table 3 reports percentages explained by the first 10 eigenimages in 
model 2. Interestingly, the total variation explained by the random intercept 
and random slope in both models is approximately the same, with 56.0% 
in model 1 vs. 54.2% for model 2. However, the random slope in model 2 
explains a much higher proportion of the total variation: 13.2% in model 2 
using EDSS versus model 1 using time. The second component of the ran¬ 
dom slope explains almost 8.5% of the total variation. We have also explored 
whether the scores of W t j (u) depend on MS type and EDSS score using the t- 
test, the Mann-Whitney-Wilcoxon-rank test, and the Kolmogorov-Smirnov 
test. For the first eigenimage, the SP type was significantly different from 
the RR (p -values <0.01 for all three tests), though it was not significantly 
different from the PP group. For the second eigenimage, the distribution of 
eigenscores for the SP type was significantly different from that of the scores 
for the RR (p -values < 0.05 for all three tests), and not significantly different 
from the distribution of the scores of the PP type. For grouping according to 
EDSS score, the distributions of the eigenscores of the first two eigenimages 
have been found to be statistically different (p-values < 0.01 for all three 
tests). Figure 9 displays beanplots similar to Figure 8 for the distributions 
of the scores in the groups defined by MS types and EDSS. This indicates 
that the deviation process Wij(v ) in models 1 and 2 carries not only useful 
but also almost identical remaining information regarding severity of MS. 

6. Discussion. The methods developed in this paper increase the scope 
and general applicability of LFPCA to very high-dimensional settings. The 
base model decomposes the longitudinal data into three main components: 
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EDSS < 5 EDSS >= 5 
EDSS score 



Fig. 9. Model 2: Group beanplots according to MS type (top) and according to EDSS 
score (bottom). 

a subject-specific random intercept, a subject-specific random slope, and re¬ 
versible visit-to-visit deviation. We described and addressed computational 
difficulties that arise with high-dimensional data using a powerful approach 
referred to as HD-LFPCA. We have developed a procedure designed to iden¬ 
tify a low-dimensional space that contains all the information for estimating 
of the model. This significantly extended the previous related efforts in the 
clustered functional principal components models, MFPCA [Di et al. (2009)] 
and HD-MFPCA [Zipunnikov et al. (2011a)]. 

We applied HD-LFPCA to a novel imaging setting considering DTI and 
MS in a primary white matter structure. Our investigation characterized 
longitudinal and cross-sectional variation in the corpus callosum. 

There are several outstanding issues for HD-LFPCA that need to be ad¬ 
dressed. First, a key assumption of our methods is that they require a mod¬ 
erate sample size that does not exceed ten thousands, or so, images. This 
limitation can be circumvented by adopting the methods discussed in the 
Appendix. Second, we have not formally included white noise in our model. 
Simulation studies in Section 4 demonstrated that a moderate amount of 
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white noise does not have a serious effect on the estimation procedure. How¬ 
ever, a more systematic treatment of the related issues is required. 

In summary, HD-LFPCA provides a powerful conceptual and practi¬ 
cal step toward developing estimation methods for structured ultrahigh- 
dimensional data. 


APPENDIX 

A.l. Large sample size. The main assumption which has been made in 
the paper is that the sample size, n = Jj, is sufficiently small to guar¬ 
antee that calculations of order 0(n 3 ) are feasible. Below we briefly describe 
how our framework can be adapted to settings with many more scans—on 
the order of tens or hundreds of thousands. 

LFPCA equation (4) models each vector Y^ as a linear combination 
of columns of matrices 3> A ’°, <&" . Assuming that 2Nx + Nw < 

n, each Ybelongs to an at most ( 2Nx + IVty)-dimensional linear space 
£(<j>Y0, cj>w ) spanned by those columns. Thus, if model (4) holds ex¬ 
actly the rank of the matrix, Y does not exceed (2 Nx + Nw) and at most 
2Nx + N\y columns of V correspond to nonzero singular values. This im¬ 
plies that the intrinsic model (7) can be obtained by projecting onto the 
first 2 Nx + Nw columns of V and the sizes of matrices A a, °, A a -’ 1 , A" in 
(7) will be (2 Nx + Nw) x Nx, (2 Nx + Nw) x Nx, and (2 Nx + Nw) x Nw, 
respectively. Therefore, the most computationally intensive part would re¬ 
quire finding the first 2 Nx + Nw left singular vectors of Y. Of course, in 
practice, model (4) never holds exactly. Hence, the number of columns of 
matrix V should be chosen to be large enough to either reasonably exceed 
(2 Nx + Nw) or to capture the most variability in data. The latter can be es¬ 
timated by tracking down the sums of the squares of the corresponding first 
singular vectors. Thus, this provides a constructive way to handle situations 
when n is too large to calculate the SVD of Y . 

There are computationally efficient ways to calculate the first k singular 
vectors of a large matrix. One way is to adapt streaming algorithms [Weng, 
Zhang and Hwang (2003), Zhao, Yuen and Kwok (2006), Budavari et al. 
(2009)]. These algorithms usually require only one pass through the data 
matrix Y during which information about the first k singular vectors is 
accumulated sequentially. Their complexity is of order 0(k 5 p). An alternate 
approach is to use iterative power methods [see, e.g., Roweis (1997)]. As the 
dimension of the intrinsic model, 2 Nx + Nw, is not known in advance, the 
number of left singular vectors to keep and project onto can be adaptively 
estimated based on the singular values of the matrix Y. Further development 
in this direction is beyond the scope of this paper. 
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A.2. Proofs. 


Proof of Lemma 1. Using the independence of Y, and Y&, the ex¬ 
pectation of pairwise quadratics is 


(13) 


E(Y in Y' kj2 ) 

{ w\ if k / i, 

nn' + K 00 -I- T ■ K 01 -I- T ■ K 10 -I -T T K 11 4- ft ■ K w 
'I'I w -t- i lj2 r^ x -t- -i- ±i n ± l32 r^ x o 3lj2 is. , 

if i = k, 


where dj 1 j 2 is 1 if j\ = j 2 and 0 otherwise. From the top equality we get 
the MM estimator of the mean, r) = n — j^ij- Th e covariances K A 

and K" can be estimated by de-meaning Yy as Y tJ = Y^ — f) and re¬ 
gressing Y ?J1 Y( J2 on 1 ,T ij21 Ti jl ,Ti jl T ij2 , and d JU2 . The bottom equality 

can be written as E (Y^(^ j = -U-* f/ /1 * where Y ihh = Y b2 ® Y iji is a 
p 2 x 1 dimensional vector, the parameter of interest is the p 2 x 5 matrix 
K 1 -' = [vec(K^), vec(K^), vec(K3P),vec(K^), vec(K u )], and the covariates 
are entries in the 5x1 vector fij 1J2 = (1, T iyi , Tjj 1 , Tjj, T iyi . 8j VJ2 )'. With this 
notation EY V = K”F, where Y v is p 2 x m dimensional with m = J 2 
and F is a 5 x m dimensional matrix with columns equal to f,y, j 2 , i = 1,..., I 
and ji,j 2 = 1, • • ■, Ji- The MM estimator of K v is thus K 77 = Y^FF 7 )” 1 , 
which provides unbiased estimators of the covariances K A and K w . If we 
denote H = F^FF 7 )^ 1 , we get the result of the lemma. □ 


Proof of Lemma 2. Let us denote by and the matrices de¬ 
fined by equations (5) with S’^U^U'^S 1 / 2 substituted for Y^Y^. The 
2 n x 2 n dimensional matrix and the nx n dimensional matrix are 
low-dimensional^counterparts of K A and K ll/ , respectively. Using the SVD 
representation Yjj = VS 1,/2 Uij, the estimated high-dimensional covariance 
matrices can be represented as K A = DK^D' and K" = VKj) V', where 
the matrix D is 2p x 2 n dimensional with orthonormal columns defined as 


(14) 


D 


V 0 


J pxn 


pxn 

v 


From the constructive definition of H, it follows that the matrices and 
K{) are symmetric. Thus, we can construct their spectral decompositions, 
K x a = A a A A x ' and K)) t = A" A A" '. Hence, high-dimensional covari- 
ance matrices can be represented as K v = DA V A A a D 7 and K" = 
VA" A A" V 7 , respectively. The result of the lemma now follows from 
the orthonornrality of the columns of matrices D and V. □ 
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Proof of Lemma 3. With notational changes, the proof is identical to 
the proof of Lemma 1. □ 

Proof of Lemma 4. With notational changes, the proof is identical to 
the proof of Lemma 2. □ 

Proof of Lemma 5. The main idea of the proof is similar to that of 
Zipunnikov et al. (2011a). We assume that function rj(v.Tjj) = 0. From the 
model it follows that oj* ~ (0, A u ), where is a covariance matrix of ctq. 
When p < Nx + JiNw the BLUP of uij is given by = Cov(a^j, vec(Yj)) x 
Var(vec(Yj)) _1 vec(Yj) = vec(Yj) [see McCulloch and 

Searle (2001), Section 9]. The BLUP is essentially a projection and, thus, it 
does not require any distributional assumptions. It may be defined in terms 
of a projection matrix. If and Cij are normal, then the BLUP is the best 
predictor. When p > Nx + JiNw the matrix BjA^B' is not invertible and 
the generalized inverse of B/A^B' is used [Harville (1976)]. In that case, 
ch* = A^B^BjA^B') - vec(Y,;) = A^/ 2 (A^/ 2 B'BjA^/ 2 )“ 1 A& 2 B( vec(Y j) = 
(B'Bi) _1 B'vec(Yi). Note that it coincides with the OLS estimator for cj* 
if u>i were a Hxed parameter. Thus, the estimated BLUPs are given by 
Wj = (B(Bj) _1 B( vec(Yj). □ 
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SUPPLEMENTARY MATERIAL 

Supplement to “Longitudinal high-dimensional principal components anal¬ 
ysis with application to diffusion tensor imaging of multiple sclerosis” (DOI: 
10.1214/14-AOAS748SUPP; .pdf). We provide extra figures and tables sum¬ 
marizing the results of simulation studies and the analysis of DTI images of 
MS patients. 
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